Generalised Degrees of Freedom (GDF), as defined by Ye (1998, JASA 93:120-131), represent the sensitivity of model fits to perturbations of the data. As such, they can be computed for any statistical model, making it possible, in principle, to derive the effective number of parameters in machine-learning approaches. GDF were originally defined for normally distributed data only; here we investigate the potential of this approach for Bernoulli data. GDF values for models of simulated and real data are compared to model-complexity estimates from cross-validation. Similarly, we computed GDF-based AICc for randomForest, neural networks and boosted regression trees and demonstrated its similarity to cross-validation. GDF estimates for binary data were unstable and inconsistently sensitive to the number of data points perturbed simultaneously, while at the same time being extremely computer-intensive to calculate. Repeated 10-fold cross-validation was more robust, based on fewer assumptions and faster to compute. Our findings suggest that the GDF approach does not readily transfer to Bernoulli data and a wider range of regression approaches.
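To illustrate the idea behind GDF, the following sketch estimates them by finite-difference perturbation of each response value, as in Ye's definition GDF = Σ_i ∂ŷ_i/∂y_i. This is a minimal, hypothetical illustration using ordinary least squares (not the authors' code): for a linear model the estimate should recover the number of parameters, since GDF then equals the trace of the hat matrix. The function names and the perturbation size `delta` are assumptions for this sketch.

```python
import numpy as np

def fit_predict(X, y):
    # Fit OLS and return in-sample predictions (illustrative model choice).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def gdf_perturbation(X, y, model=fit_predict, delta=1e-4):
    # Perturbation estimate of generalised degrees of freedom:
    # GDF = sum_i d(yhat_i)/d(y_i), approximated by finite differences.
    yhat = model(X, y)
    gdf = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += delta           # perturb one response value
        yhat_pert = model(X, y_pert)
        gdf += (yhat_pert[i] - yhat[i]) / delta
    return gdf

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
print(gdf_perturbation(X, y))  # for OLS this equals p = 3
```

For a non-linear learner one would plug in its own fit-and-predict routine, which is exactly where the abstract's point bites: each of the n perturbations requires a full refit, and for Bernoulli responses the additive perturbation itself is no longer natural.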